Multiple input files support #6

Open
mariusmilea wants to merge 2 commits into grisha:master from mariusmilea:multiple_input_files_support
@mariusmilea

We use this tool heavily to convert a couple of GB of JSON files into Avro every day.
I found it useful for the tool to accept multiple JSON files as input, hence this commit.
Originally, the only way to convert a batch of JSON files with json2avro was to concatenate them and pipe the result in:

cat file1.json file2.json file3.json | json2avro -S schema_file output.avro

With this patch, json2avro can also be used like this:

json2avro -S schema_file file1.json file2.json file3.json output.avro

This removes the need for cat or any other utility to concatenate the input files.
Running json2avro with multiple input files is between 1 and 1.5 seconds faster for a 160 MB batch of JSON files.
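
The patch itself is not shown in this conversation, so the following is only a minimal sketch of how the new calling convention could be interpreted. It assumes the options (such as -S) have already been consumed, that the last remaining argument is the output file, and that everything before it is an input file. With no input files, the tool would keep reading from stdin as before. The names `io_args` and `split_io_args` are hypothetical and do not come from json2avro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the patch: describes the positional
 * arguments that remain after option parsing. */
struct io_args {
    char **inputs;      /* first input path, or NULL when reading stdin */
    int n_inputs;       /* number of input paths; 0 means read stdin */
    const char *output; /* last positional argument */
};

/* Split n_pos positional arguments into inputs and one output.
 * Returns 0 on success, -1 when no output file was given. */
int split_io_args(int n_pos, char **pos, struct io_args *out)
{
    assert(out != NULL);
    if (n_pos < 1 || pos == NULL)
        return -1;
    out->n_inputs = n_pos - 1;                    /* all but the last */
    out->inputs = out->n_inputs > 0 ? pos : NULL; /* NULL: fall back to stdin */
    out->output = pos[n_pos - 1];
    return 0;
}
```

Under these assumptions, `json2avro -S schema_file output.avro` yields zero inputs (the original stdin behaviour), while the three-file invocation above yields three inputs that can be opened and read one after another.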

@mariusmilea force-pushed the multiple_input_files_support branch from d6953b0 to 1732321 on December 3, 2014 14:06
@grisha
Owner

grisha commented Jan 3, 2016

@spil-marius Sorry - I somehow never saw this pull request until now. How has this been working for you? Do you think it is OK to merge?


2 participants